19 research outputs found

    Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions

    Boosting is one of the most important methods for fitting regression models and building prediction rules from high-dimensional data. A notable feature of boosting is that the technique has a built-in mechanism for shrinking coefficient estimates and selecting variables. This regularization mechanism makes boosting a suitable method for analyzing data characterized by small sample sizes and large numbers of predictors. We extend the existing methodology by developing a boosting method for prediction functions with multiple components. Such multidimensional functions occur in many types of statistical models, for example in count data models and in models involving outcome variables with a mixture distribution. As will be demonstrated, the new algorithm is suitable for both the estimation of the prediction function and the regularization of the estimates. In addition, nuisance parameters can be estimated simultaneously with the prediction function.

    Geoadditive Regression Modeling of Stream Biological Condition

    Indices of biotic integrity (IBI) have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified while, in addition, accurate predictions of biological condition at unsurveyed locations are obtained.

    Glatiramer Acetate Treatment Normalizes Deregulated microRNA Expression in Relapsing Remitting Multiple Sclerosis

    The expression of selected microRNAs (miRNAs) known to be involved in the regulation of immune responses was analyzed in 74 patients with relapsing remitting multiple sclerosis (RRMS) and 32 healthy controls. Four miRNAs (miR-326, miR-155, miR-146a, miR-142-3p) were aberrantly expressed in peripheral blood mononuclear cells from RRMS patients compared to controls. Although expression of these selected miRNAs did not differ between treatment-naïve (n = 36) and interferon-beta treated RRMS patients (n = 18), expression of miR-146a and miR-142-3p was significantly lower in glatiramer acetate (GA) treated RRMS patients (n = 20), suggesting that GA, at least in part, restores the expression of deregulated miRNAs in MS.

    Improving the split criteria for classification trees and ensemble methods

    Ensemble methods are popular learning algorithms that typically achieve very high classification accuracy by combining the predictions of a set of classifiers. They have been widely used in many research areas. This thesis addresses the question of whether and to what extent the predictive quality of ensemble methods can be improved by optimizing individual decision trees. First, the classical CART (Classification And Regression Trees) algorithm is presented. In view of its well-known variable selection bias and model instability, an improved splitting criterion is introduced that uses the p-value of a maximally selected rank statistic in the search for the optimal split. The performance of the proposed method with respect to both problems is examined in extensive simulation studies. Since the CART algorithm belongs to the family of greedy algorithms, it does not guarantee that a globally optimal solution is reached. A penalized global look-ahead strategy is developed to address this problem. The basic idea is to use several alternative cutpoints in addition to the best cutpoint when partitioning the data. As a result, an ensemble of classification trees is created. From this ensemble, either the best model is selected by means of a model uncertainty criterion, or the models are aggregated for prediction. Finally, the possible improvement in predictive performance of the p-value adjusted methods is examined in two extensive simulation studies. The thesis concludes with an application of the proposed methods to two colorectal cancer data sets, a discussion, and some possibilities for further research.

    MINERALOGY OF SALT SEDIMENTS IN WELLS AND IN OTHER OIL-FIELD EQUIPMENT OF DEPOSITS OF WESTERN SIBERIA

    The objects of study are the salt sediments in oil-field equipment. The purpose of the study is to determine the phase composition of the salt sediments as a mineralogical basis for developing methods to prevent and control them. The study methods are microscopic, microchemical, infrared-spectroscopic, thermal, laser microspectral, and X-ray (roentgenographic) analysis. 15 of 29 minerals in the salt sediments have been ascertained. The causes and mechanisms of mineral formation have been considered. Factual material for the development of rational methods of controlling the salt sediments has been obtained. The study results have been introduced in six oil- and gas-producing organizations. The field of application is oil production. Available from VNTIC, the Scientific & Technical Information Centre of Russia.

    Health-related use of the Internet in Germany 2007

    The European eHealth Trends project analyses the attitudes of European citizens towards eHealth applications, and their usage of them, in the time frame 2005–2007. In April/May 2007 the second series of representative stratified surveys with computer-based telephone interviews (CATI) (in Germany based on the German ADM Master Sample) was performed by a poll agency in seven European countries. Here we report the major results for the German population, where 1000 participants aged between 15 and 80 years were interviewed. For the general use of the Internet for health purposes as well as the established eHealth Internet use (at least once a month) we report a significant increase (from 44.4% to 56.6% and from 22.5% to 32.0%, respectively). Further, the percentage of Germans who consider the Internet an important medium for health purposes increased from 33.7% to 36.8%. In Bavaria, the percentage of established eHealth Internet users was the lowest among the German states. The results of our eHealth Trends survey in Germany show a considerable increase in eHealth use within the last 18 months. German physicians need to be prepared for an increasing number of empowered patients, who have searched for information on their health problems on the Internet, but will also demand more enhanced services.